Estimates of point-to-point telephone traffic are required for the 
current and the long-range planning of the Bell System's Public 
Switched Network. Because of the potentially immense volume of data 
which must be processed, these estimates are typically based upon 
small samples of total traffic and, therefore, can have large statistical 
errors. In this paper, we develop a model for quantifying the accuracy 
of point-to-point traffic measurements as a function of sample size and 
traffic parameters. Together with a worth-of-data model, not described 
here, our results can be used to establish a cost-optimal sampling rate 
for point-to-point traffic measurement systems. To date, our results 
have been used to establish 20 percent as an upper bound on a 
cost-optimal sampling rate for a usage measurement system and 10 percent 
for an attempt-only measurement system. We show, however, that the 
attempt-based estimate is, for sampling rates greater than about 2 
percent, less accurate than the usage-based estimate. We also show how 
the accuracy of point-to-point load estimates can be improved by 
employing a ratio-estimate which combines point-to-point and 
trunk-group measurements; however, in practical applications, we find 
that the improvement is not significant. 

I. INTRODUCTION 

Trunk-group and point-to-point traffic data systems provide the 
measurements of telephone traffic which are used for the current and 
the long-range planning of the Bell System's Public Switched Network. 
Trunk-group data systems provide estimates of the traffic offered to 
existing trunk groups. Normally, an estimate of trunk-group offered load 
is based upon a direct measurement of the average number of busy 
trunks, the average attempt count, and the average overflow count. 1 

Point-to-point traffic data systems provide estimates of the telephone 
traffic which originates at one and terminates at the other of a specific 
pair of network points not necessarily joined by a single trunk group; for 
example, the end-office pair (A, B) of Fig. 1. In the trunk-provisioning 
process, estimates of point-to-point offered loads are required to plan 
for the introduction of new trunk groups and the rehoming of end-offices 
or tandems. In general, they are also used, as a supplement to trunk- 
group measurements, in the network disassembly process (the process 
that converts measured loads on trunk groups which receive overflow 
traffic to first-route loads) and in the network assembly process (the 
process which converts projected first-route loads to total offered loads). 
Moreover, with the possible introduction of dynamic traffic routing, our 
studies have shown that the trunk-provisioning process will require more 
extensive use of point-to-point data than is required in the present 
hierarchical fixed-routing network. 

Estimates of point-to-point offered loads cannot, in general, be derived 
from trunk group measurements since trunk groups typically carry more 
than one point-to-point load. Instead, estimates of point-to-point offered 
loads are derived from detailed records of the origin, destination, and, 
when available, holding times of individual calls. (When holding times 
are not available, a load estimate can be based upon an attempt count 
measurement together with an exogenous estimate of mean holding time; 
see Section 3.2.) 

To reduce the costs for recording and processing point-to-point data, 
most existing measurement systems have been designed to record only 
a small sample of total traffic. For example, the Centralized Message 
Data System (CMDS, see Section II) provides estimates of point-to-point 
loads derived from a 5-percent sample of all toll calls. But while sampling 
reduces the cost of providing point-to-point data, it also introduces 
statistical measurement errors that reduce the accuracy and, hence, the 
worth of the data. 

Fig. 1. An application of point-to-point data: planning new trunk groups.

In this paper, we develop a model for quantifying the accuracy of 
point-to-point traffic measurements as a function of sample size and 
traffic parameters. Together with a worth-of-data model 2 for quantifying 
the cost impact of data errors on the network provisioning process, this 
data accuracy model can be used to establish the trunk-engineering re- 
quirements for point-to-point data systems. 

In Section II, we describe how point-to-point loads are measured by 
CMDS and we develop a model for quantifying sampling error. While this 
is a specific example, the methods and results are directly applicable to 
other (existing and proposed) point-to-point traffic measurement sys- 
tems. In Section III we use our model to analyze three methods for es- 
timating point-to-point loads: one based upon a usage measurement, 
one upon an attempt count together with an exogenous estimate of 
holding time, and one upon a combination of point-to-point and 
trunk-group measurements. A summary is given in the last section, and 
the required statistical results are developed in Appendices A and B. 

II. POINT-TO-POINT MEASUREMENTS 

For toll traffic, the major source of point-to-point data is provided by 
the Centralized Message Data System. In this section, we describe the 
CMDS data base and model the various sources of error. 

2.1 The CMDS data base 

For every point-to-point traffic item (defined by originating and 
terminating end-office prefix codes), the CMDS data base provides an 
estimate of both the total number of calls and the associated usage (i.e., 
sum of holding times) for calls that originate during a time-consistent 
hour over 20 consecutive business days. 

These estimates are based upon a 5-percent sample of the total 
number of calls processed by the toll billing equipment in each Regional 
Accounting Office (RAO). Figure 2 illustrates the process. Automatic 
Message Accounting (AMA) tapes are periodically shipped to a Regional 
Accounting Office where they are processed to produce sequential rec- 
ords of the origin, destination, and conversation time of individual calls. 
As these records are processed for customer billing, the record for every 
20th call is transmitted (in a batch mode) to the CMDS computer in 
Kansas City, where they are sorted and summarized to provide estimates 
of individual point-to-point loads. 
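
The every-20th-record selection described above can be sketched as follows. This is only an illustration: the record stream, the point-pair labels, and the function names `sample_every_nth` and `realized_rates` are hypothetical and are not part of CMDS. It shows that, although the overall sampling rate is fixed, the fraction of an individual point-pair's calls that is sampled can differ from 5 percent.

```python
import random
from collections import Counter


def sample_every_nth(records, n=20):
    """Return every n-th record of the sequence (the n-th, 2n-th, ...)."""
    return records[n - 1::n]


def realized_rates(records, sampled):
    """Per point-pair fraction of records that ended up in the sample."""
    total = Counter(records)
    kept = Counter(sampled)
    return {pair: kept[pair] / total[pair] for pair in total}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical record stream: each record is just a point-pair label.
    pairs = ["A-B", "A-C", "B-C", "C-D"]
    weights = [50, 30, 15, 5]
    records = rng.choices(pairs, weights=weights, k=10_000)
    sampled = sample_every_nth(records, n=20)
    print("overall rate:", len(sampled) / len(records))
    for pair, rate in sorted(realized_rates(records, sampled).items()):
        print(pair, round(rate, 4))
```

The low-volume pairs typically show the largest relative departure from the nominal rate, which is the sample-size variation that the error model has to account for.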

2.2 Sources of error 

Fig. 2. Centralized message data system.

Since estimates of point-to-point offered loads are based upon
measurements made over several time-consistent hours, and since source
loads are known to vary from day to day, our model will account for
statistical errors due to both the finite measurement interval and
day-to-day load variation (Ref. 3). Furthermore, since the measurements
are obtained from a 5-percent sample of total traffic, our model will
also account for variations in the sample size for individual
point-pairs. That is, depending upon the position of calls in the
sequence of message records (from which the 5-percent sample is
obtained), the actual sample size for an individual point-pair can be
more, or less, than 5 percent. In Section 2.3, we develop a model for
quantifying these sources of error.

The CMDS data base excludes toll traffic which is not billed. In addi- 
tion, of course, to blocked calls, CMDS also excludes call set-up and 
ringing time (for both completed and noncompleted calls), directory 
assistance calls, and official calls which are not detailed billed. Estimates 
of this nonbilled usage are, therefore, an additional source of error for 
CMDS-based load estimates. However, our studies have shown that this 
error is negligible in comparison with sampling error and, therefore, it 
will not be accounted for by our model. (Section 3.2 describes a method 
for estimating nonbilled usage.) 

2.3 Mathematical model 

Estimates of point-to-point offered loads are normally based upon
measurements made over $K$ disjoint time-consistent intervals
$I_1, \ldots, I_K$, each of length $t$ (typically, $K = 20$ and
$t = 1$ hour). We assume that the distribution of realized loads can be
described by the model used by Hill and Neal (Ref. 3) to explain the
observed variation of trunk-group offered loads. Thus, during $I_j$, we
assume that call arrivals are Poisson-distributed* with rate
$\lambda_j$ and that call-holding times are independent and
exponentially distributed with mean $h$. Furthermore, in accordance
with the model for day-to-day load variation developed in Ref. 3, the
loads $a_j = \lambda_j h$, $j = 1, \ldots, K$, are assumed to be
independent and identically distributed with mean $a = \lambda h$ and
variance

$$ v_d = \max\left\{0,\; 0.13\,a^{\phi} - \frac{2a}{t/h}\right\}, \tag{1} $$

where $\phi$ is a parameter that describes the level of day-to-day
variation. For engineering applications, we use $\phi = 1.5$, 1.7, or
1.84, which are referred to, respectively, as low, medium, or high
day-to-day variation. For first-routed and point-to-point traffic,
$\phi = 1.5$ is usually appropriate.

* Point-to-point offered loads correspond to trunk-group first-offered
(Poisson) loads; hence, it is appropriate to set the peakedness factor,
$z$, of Ref. 3 to unity.
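
As a rough numerical illustration of eq. (1), the source-load variance can be evaluated as below. The function name `day_to_day_variance` is hypothetical, and the default values of $t$ and $h$ are those quoted later for Fig. 3. The subtracted term is taken here to be $2a/(t/h)$, which is an assumption of this sketch chosen to agree with the finite-interval term of the variance derived in Appendix A.

```python
def day_to_day_variance(a, phi=1.5, t=3600.0, h=250.0):
    """Source-load variance v_d of eq. (1), in erlangs squared.

    a   : mean offered load (erlangs)
    phi : day-to-day variation parameter (1.5, 1.7, or 1.84)
    t   : length of one measurement interval (seconds)
    h   : mean holding time (seconds)
    The subtracted term 2a/(t/h) is an assumption of this sketch.
    """
    return max(0.0, 0.13 * a ** phi - 2.0 * a / (t / h))


if __name__ == "__main__":
    for a in (1.0, 5.0, 15.0):
        for phi in (1.5, 1.7, 1.84):
            print(a, phi, round(day_to_day_variance(a, phi), 4))
```

For small loads the difference is negative and the `max` clamps the variance at zero, so under this model all of the observed variation is then attributed to the finite measurement interval.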

To model the sampling process, we assume that in the sequence of 
message records, each call associated with a given point-pair is included 
in the sample with the same probability p (for CMDS, p = 0.05); i.e., we 
assume a multinomial distribution for the numbers of sampled calls 
belonging to given point-pairs. (For CMDS, the actual distribution is more 
closely approximated by a hypergeometric distribution; however, since 
the number of calls belonging to a given point-pair is a small fraction of 
the total number of calls processed by an RAO, our simplifying as- 
sumption introduces no significant loss of accuracy.) 

Let $N_j$ denote the number of arrivals during $I_j$ and let $h_{ij}$ be
the holding time of the $i$th arrival in $I_j$. Then, with
$\delta_{ij} = 1$ if the $i$th call is included in the sample and zero
otherwise,

$$ c = \sum_{j=1}^{K} \sum_{i=1}^{N_j} \delta_{ij} \tag{2} $$

is the total number of sampled calls during
$I = \bigcup_{j=1}^{K} I_j$, and

$$ u = \sum_{j=1}^{K} \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \tag{3} $$

is the corresponding usage.
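
The two measurements defined in eqs. (2) and (3) can be generated by a small simulation of the model. The sketch below is illustrative only: the function name `simulate_counts` is hypothetical, and for brevity it holds the arrival rate constant across intervals, i.e., it omits the day-to-day variation of the source load.

```python
import random


def simulate_counts(lam, h, t, K, p, rng):
    """Simulate K intervals; return (c, u, n_total, usage_total).

    Arrivals in each interval are Poisson with rate lam (per second),
    holding times are exponential with mean h (seconds), and each call is
    kept in the sample independently with probability p.
    """
    c = 0              # sampled attempt count, eq. (2)
    u = 0.0            # sampled usage, eq. (3)
    n_total = 0        # all arrivals, sampled or not
    usage_total = 0.0  # usage of all arrivals
    for _ in range(K):
        clock = rng.expovariate(lam)          # first arrival time
        while clock < t:
            hold = rng.expovariate(1.0 / h)   # holding time h_ij
            delta = 1 if rng.random() < p else 0
            c += delta
            u += hold * delta
            n_total += 1
            usage_total += hold
            clock += rng.expovariate(lam)     # next arrival
    return c, u, n_total, usage_total


if __name__ == "__main__":
    rng = random.Random(0)
    # 5 erlangs with h = 250 s corresponds to lam = 0.02 calls per second.
    print(simulate_counts(lam=0.02, h=250.0, t=3600.0, K=20, p=0.05, rng=rng))
```

With `p = 1` the sampled quantities coincide with the totals, which corresponds to a 100-percent measurement.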

III. LOAD ESTIMATES 

In this section, we analyze three procedures for estimating
point-to-point loads. The first estimate, $\hat{a}^{(1)}$, is based upon
the usage measurement, $u$; the second estimate, $\hat{a}^{(2)}$, upon
the attempt count, $c$; and the third estimate, $\hat{a}^{(3)}$, upon a
combination of point-to-point and trunk-group measurements. (Although
these do not exhaust the possible estimates, they do form the basis for
analyzing more complex estimates; for example, an estimate of the
offered load at 10 a.m. could be based upon a combination of the
measured loads at 9, 10, and 11 a.m.) In each case, we use mean square
error (MSE) to measure the accuracy of the load estimate; i.e., if
$\hat{a}$ denotes an estimate of the mean offered load $a$, then

$$ \mathrm{MSE}\{\hat{a}\} = E\{(\hat{a} - a)^2\}
   = \mathrm{Var}\{\hat{a}\} + E^2\{\hat{a} - a\}. \tag{4} $$

3.1 Estimate 1

Since $u$ [eq. (3)] is a $p$-sample of usage over $K$ intervals, each of
length $t$,

$$ \hat{a}^{(1)} = \frac{u}{Kpt} \tag{5} $$

is an estimate of the corresponding average offered load $a$.
In Appendix A, we show that $\hat{a}^{(1)}$ is unbiased, i.e.,

$$ E\{\hat{a}^{(1)}\} = a, \tag{6} $$

and has variance

$$ \mathrm{Var}\{\hat{a}^{(1)}\} = \frac{1}{K}\left[\frac{2a}{pt/h} + v_d\right]; \tag{7} $$

hence, from eq. (4),

$$ \mathrm{MSE}\{\hat{a}^{(1)}\} = \frac{1}{K}\left[\frac{2a}{pt/h} + v_d\right]. \tag{8} $$



In (8), the first term, $2a/(pt/h)$, represents the combined effects of
the finite measurement interval and deviations from the average sample
size. The second term, $v_d$, is due to (day-to-day) variations in the
source load. Of course, the factor $1/K$ is due to averaging
measurements over $K$ independent intervals.
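
Equation (8) is simple enough to evaluate directly. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the default parameter values are those quoted for Fig. 3, and `day_to_day_variance` uses an assumed form of eq. (1) in which the subtracted term is $2a/(t/h)$. Values it prints depend on that assumption and need not coincide with values read from the figures.

```python
import math


def day_to_day_variance(a, phi, t, h):
    """Assumed form of eq. (1): max{0, 0.13 a^phi - 2a/(t/h)}."""
    return max(0.0, 0.13 * a ** phi - 2.0 * a / (t / h))


def mse_usage(a, p, K=20, t=3600.0, h=250.0, phi=1.5):
    """MSE of the usage-based estimate, eq. (8)."""
    sampling_term = 2.0 * a / (p * t / h)   # finite interval and sample size
    return (sampling_term + day_to_day_variance(a, phi, t, h)) / K


def percent_rms(mse, a):
    """RMS error expressed in percent of the mean load a."""
    return 100.0 * math.sqrt(mse) / a


if __name__ == "__main__":
    for a in (5.0, 15.0):
        for p in (0.05, 1.0):
            print(a, p, round(percent_rms(mse_usage(a, p), a), 2))
```

Under the assumed form of eq. (1), the bracket in eq. (8) reduces to $0.13\,a^{\phi}$ at $p = 1$ whenever $v_d > 0$, so the 100-percent case depends on $h$ only through the clamp at zero.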

Figure 3 displays the root-mean-square (RMS) error of $\hat{a}^{(1)}$ (in
percent of mean load) as a function of average offered load for sampling
rates of 5 and 100 percent. The results for a 5-percent sample apply when
the offered load is estimated using CMDS data, while those for a
100-percent sample apply when the load estimate is based directly upon
trunk-group measurements. As noted in Section 2.2, errors in estimates of
nonbilled usage, associated with CMDS estimates, are negligible in
comparison with sampling error. Also, in applying our results to
trunk-group measurements, we assume that $u$ [eq. (3)] adequately
approximates the actual usage during the measurement interval $I$; i.e.,
we assume that the edge effects are negligible (see Ref. 3). Furthermore,
our studies have shown that the additional variance caused by discretely
sampling the usage with a 100-second-scan Traffic Usage Recorder is
negligible when compared with the variance caused by day-to-day load
variation. The results shown in Fig. 3 assume the standard measurement
interval ($t = 1$ hour, $K = 20$), low day-to-day variation
($\phi = 1.5$), and $h = 250$ seconds.

Fig. 3. Sampling error vs offered load.

Note that estimates based upon a 5-percent sample can have errors 
that are large relative to those based upon a 100-percent sample. For 
example, for an offered load of 5 erlangs (typical of base year prove-in 
loads for new high-usage trunk groups), the RMS error for a 5-percent 
sample is about 20 percent, compared with an RMS error of about 5 
percent for a 100-percent sample. Similarly, for an offered load of about 
15 erlangs (typical of loads offered to existing Long Lines high-usage 
trunk groups), an estimate based upon a 5-percent sample has an RMS 
error of about 12 percent, while for a 100-percent sample, the RMS error 
is about 4 percent. 

Figure 4 displays the percent RMS error of $\hat{a}^{(1)}$ as a function
of the sampling rate $p$ for offered loads of 5 and 15 erlangs. The
important result to note is that the statistical variability of
$\hat{a}^{(1)}$ does not decrease appreciably as the sampling rate is
increased beyond about 20 percent. This occurs since the contribution of
day-to-day load variation is independent of the sampling rate, and above
a sampling rate of about 20 percent it becomes the dominant source of
error. Of course, any improvement in accuracy is significant if the
associated benefits justify the increased cost for data collection and
processing. However, using the worth-of-data model of Ref. 2, which
quantifies the cost impact of data errors on the provisioning of direct
final (i.e., nonalternate route) trunk groups, we have established 20
percent as an upper bound on a cost-optimal sampling rate. To establish
an actual cost-optimal sampling rate, however, we require a
worth-of-data model which applies to more general network configurations
(i.e., alternate routing networks); such a model is currently being
formulated.

Fig. 4. Sampling error vs sampling rate.

3.2 Estimate 2 

The usage-based estimate of offered load (i.e., Estimate 1) is derived 
from attempt count and holding-time measurements. In this section, 
we analyze an alternative estimate based upon an attempt count together 
with an exogenous estimate of the corresponding mean holding time. The 
data collection and processing costs for an attempt-based point-to-point 
data system are less than for a usage-based system; however, we will show 
that the load estimates are substantially less accurate. 

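Let $\hat{h}$ (a constant) denote an estimate of the mean holding time
$h$. Then, since $c$ [eq. (2)] is the total number of sampled calls
during $I$, $c/(Kpt)$ is an estimate of the mean attempt rate $\lambda$
and, therefore,

$$ \hat{a}^{(2)} = \frac{c}{Kpt}\,\hat{h} \tag{9} $$

is an estimate of the average offered load $a$.
In Appendix A we show that

$$ E\{\hat{a}^{(2)}\} = \lambda\hat{h}, \tag{10} $$

so that $\hat{a}^{(2)}$ is biased whenever $\hat{h} \neq h$, and

$$ \mathrm{Var}\{\hat{a}^{(2)}\} = \left(\frac{\hat{h}}{h}\right)^{2}
   \frac{1}{K}\left[\frac{a}{pt/h} + v_d\right]; \tag{11} $$

hence, from (4),

$$ \mathrm{MSE}\{\hat{a}^{(2)}\} = \left(\frac{\hat{h}}{h}\right)^{2}
   \frac{1}{K}\left[\frac{a}{pt/h} + v_d\right]
   + a^{2}\left(\frac{\hat{h}}{h} - 1\right)^{2}. \tag{12} $$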

Clearly, $\mathrm{MSE}\{\hat{a}^{(2)}\}$ depends upon the error
$(\hat{h} - h)$. In practice, the same estimate $\hat{h}$ would be
applied to a collection of point-pairs (e.g., all point-pairs within an
operating company, or all point-pairs served by a common trunk group),
and our studies have found that the corresponding distribution of errors
$(\hat{h} - h)$ has a coefficient of variation of at least 20 percent.
Accordingly, our numerical results will assume that $\hat{h}$ is in
error by 20 percent.

Figure 5 displays the percent RMS error of $\hat{a}^{(2)}$ as a function
of sampling rate for an offered load of 5 erlangs. We assume the same
numerical values for $K$, $t$, $\phi$, and $h$ as in Fig. 3, and we
assume $\hat{h}/h = 1.2$. For purposes of comparison, Fig. 5 also
displays the percent RMS error of $\hat{a}^{(1)}$, as given previously in
Fig. 4.
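
The mean square error of the attempt-based estimate, i.e., the variance of Appendix A plus the squared bias $a^2(\hat{h}/h - 1)^2$, can be evaluated in the same way as for Estimate 1. The sketch below is hedged: the function names are hypothetical, `h_ratio` stands for $\hat{h}/h$, and `day_to_day_variance` uses an assumed form of eq. (1) with subtracted term $2a/(t/h)$.

```python
import math


def day_to_day_variance(a, phi, t, h):
    """Assumed form of eq. (1): max{0, 0.13 a^phi - 2a/(t/h)}."""
    return max(0.0, 0.13 * a ** phi - 2.0 * a / (t / h))


def mse_attempt(a, p, h_ratio, K=20, t=3600.0, h=250.0, phi=1.5):
    """MSE of the attempt-based estimate; h_ratio is h_hat / h."""
    v_d = day_to_day_variance(a, phi, t, h)
    variance = h_ratio ** 2 * (a / (p * t / h) + v_d) / K
    bias = a * (h_ratio - 1.0)          # E{a_hat} - a = lambda*h_hat - lambda*h
    return variance + bias ** 2


if __name__ == "__main__":
    a = 5.0
    for p in (0.02, 0.05, 0.10, 0.20, 1.00):
        rms = 100.0 * math.sqrt(mse_attempt(a, p, h_ratio=1.2)) / a
        print(p, round(rms, 2))
```

Because the bias term does not depend on $p$, the percent RMS error can never fall below $100\,|\hat{h}/h - 1|$ percent, however large the sample.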

We draw two conclusions from the results shown in Fig. 5. First,
whereas 20 percent is a reasonable upper bound on sampling rate for a
usage-based measurement system, a sampling rate of about 10 percent
is sufficient for an attempt-based system. Of course, if $\hat{h}$ were
known to be in error by more (less) than 20 percent, a sampling rate of
less (more) than 10 percent would be appropriate. But if we know only the
coefficient of variation of the distribution of $\hat{h}$ (which we
assume to be 20 percent), then the average value of
$\mathrm{MSE}\{\hat{a}^{(2)}\}$, with respect to this distribution,
cannot be significantly reduced by increasing the sampling rate beyond
10 percent. Second, we note that an estimate based upon measured usage
is, for sampling rates greater than about 2 percent, more accurate than
an estimate based upon an attempt count. (For sampling rates less than
2 percent, the standard deviation of the measured holding time exceeds
that of the estimate $\hat{h}$; hence, $\hat{a}^{(2)}$ is relatively more
accurate in this range.)

In view of the above results, we conclude that usage measurements
(when available) are preferable to attempt counts for estimating
point-to-point loads. However, our studies have shown that the attempt
count provides a more accurate basis for estimating CMDS nonbilled
usage than does the measured (billed) usage. That is, with CMDS data,
an estimate of the form $\hat{a} = (u + \beta c)/(Kpt)$ is employed,
where the first term, $u/(Kpt)$, is an estimate of billed load and the
second term, $\beta c/(Kpt)$, is an estimate of nonbilled load. Thus,
$\beta$ can be interpreted as an estimate of an average nonbilled
holding time per billed attempt. Furthermore, we have shown that a small
additional improvement can be obtained by employing a load-dependent
combination of $u$ and $c$ to estimate billed load. However, this
additional improvement is not significant.

Fig. 5. Comparison of attempt-based and usage-based offered-load
estimates (offered load = 5 erlangs). Abscissa: percent sampling rate;
curves: attempt-based $\hat{a}^{(2)}$ and usage-based $\hat{a}^{(1)}$.

3.3 Estimate 3 

In this section, we show how the statistical variability of point-to-point 
load estimates can be reduced by combining point-to-point and trunk- 
group measurements. The procedure we describe has been proposed as 
a means for improving the accuracy of CMDS-based load estimates; 
however, we show that the improvement is not significant. 

Consider a trunk group whose total offered load is the sum of $N$
point-to-point first-offered loads. For the $i$th load,
$i = 1, \ldots, N$, let $a_i$ denote the mean load and let
$\hat{a}_i^{(1)}$ be the (point-to-point) usage-based estimate of $a_i$.
Furthermore, let $\hat{T}$ denote the estimate of trunk-group offered
load based upon trunk-group usage measurements and let
$\hat{A} = \sum_{i=1}^{N} \hat{a}_i^{(1)}$ denote the corresponding
estimate (for the same measurement interval) based upon point-to-point
usage data. Since $\hat{T}$ is based upon a 100-percent sample, the
difference $(\hat{T} - \hat{A})$ measures the sum of the errors relative
to the realized loads in the individual estimates $\hat{a}_i^{(1)}$. By
assigning a fraction ($w_i$) of this difference to the individual
estimates $\hat{a}_i^{(1)}$, we obtain a new estimate of $a_i$; i.e.,

$$ \hat{a}_i^{(3)} = \hat{a}_i^{(1)} + w_i\,(\hat{T} - \hat{A}). \tag{13} $$

In Appendix B, we show that an approximation to a minimum-variance
linear estimate of $a_i$ is obtained when
$w_i = \hat{a}_i^{(1)}/\hat{A}$. Thus, we have the ratio-estimate

$$ \hat{a}_i^{(3)} = \frac{\hat{a}_i^{(1)}}{\hat{A}}\,\hat{T}. \tag{14} $$

Since $\hat{a}_i^{(1)}$ appears as a summand in $\hat{A}$, the ratio
$\hat{T}/\hat{A}$ is negatively correlated (or tends to vary inversely)
with $\hat{a}_i^{(1)}$. Physically, it is this negative correlation
which makes $\hat{a}_i^{(3)}$ statistically less variable than
$\hat{a}_i^{(1)}$.
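
The ratio-estimate amounts to rescaling the point-to-point estimates so that they sum to the trunk-group estimate. A minimal sketch follows; the function name `ratio_estimates` and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def ratio_estimates(point_estimates, trunk_estimate):
    """Scale each usage-based estimate by T_hat / A_hat (the ratio-estimate)."""
    a_hat_total = sum(point_estimates)              # A_hat
    if a_hat_total <= 0:
        raise ValueError("sum of point-to-point estimates must be positive")
    scale = trunk_estimate / a_hat_total            # T_hat / A_hat
    return [x * scale for x in point_estimates]


if __name__ == "__main__":
    # Hypothetical numbers, in erlangs.
    point = [4.2, 6.1, 2.9]   # usage-based estimates from the sampled records
    trunk = 14.0              # trunk-group estimate for the same hours
    adjusted = ratio_estimates(point, trunk)
    print(adjusted, sum(adjusted))
```

The adjusted estimates keep the proportions of the sampled estimates, and their sum equals the trunk-group estimate.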

By employing a first-order Taylor series approximation to
$\hat{a}_i^{(3)}$, we obtain in Appendix A the following approximations
for the mean and variance of $\hat{a}_i^{(3)}$:

$$ E\{\hat{a}_i^{(3)}\} \approx a_i \tag{15} $$

and

$$ \mathrm{Var}\{\hat{a}_i^{(3)}\} \approx \frac{2a_i}{Kpt/h}
   \left\{1 - f_i(1 - p)\right\} + \frac{v_{di}}{K}, \tag{16} $$

where

$$ f_i = \frac{a_i}{\sum_{j=1}^{N} a_j} \tag{17} $$

is the fraction of the total load contributed by the $i$th point-pair.
Also, since our results are not significantly affected by differences in
mean holding times, we have assumed that each point-to-point load has
the same mean holding time, $h$. From (4), (15), and (16) we have

$$ \mathrm{MSE}\{\hat{a}_i^{(3)}\} \approx \frac{2a_i}{Kpt/h}
   \left\{1 - f_i(1 - p)\right\} + \frac{v_{di}}{K}. \tag{18} $$



Figure 6 displays the percent RMS error of $\hat{a}_i^{(3)}$ as a
function of offered load for several values of the parameter $f_i$. We
assume a sampling rate of 5 percent (the CMDS sampling rate) and the same
numerical values for $K$, $t$, $h$, and $\phi$ as in Fig. 3. Note that
$\hat{a}_i^{(3)}$ is more accurate than $\hat{a}_i^{(1)}$ and that the
relative difference in accuracy is a maximum when $f_i$ equals one (since
$\hat{a}_i^{(3)} = \hat{T}$ when $f_i = 1$) and approaches zero as $f_i$
approaches zero (since the variance of $\hat{T}/\hat{A}$ approaches zero
and, hence, $\hat{a}_i^{(3)}$ approaches $\hat{a}_i^{(1)}$ as $f_i$
approaches zero).

The results of Fig. 6 are perhaps more striking when viewed in terms
of the reciprocal of $f_i$, which can be interpreted as the number ($N'$)
of equal-sized point-to-point loads corresponding to $f_i$. That is, the
relative difference in accuracy of $\hat{a}_i^{(1)}$ and
$\hat{a}_i^{(3)}$ is a rapidly decreasing function of $N'$; for $N'$
greater than 4, the relative difference is less than about 4 percentage
points.
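
The dependence on the load fraction can be explored numerically with the approximate mean square error $\frac{2a_i}{Kpt/h}\{1 - f_i(1-p)\} + v_{di}/K$ of the ratio-estimate. The sketch below is illustrative: the function names are hypothetical, the defaults are the parameter values quoted for Fig. 3, and `day_to_day_variance` uses an assumed form of eq. (1) with subtracted term $2a/(t/h)$.

```python
import math


def day_to_day_variance(a, phi, t, h):
    """Assumed form of eq. (1): max{0, 0.13 a^phi - 2a/(t/h)}."""
    return max(0.0, 0.13 * a ** phi - 2.0 * a / (t / h))


def mse_usage(a, p, K=20, t=3600.0, h=250.0, phi=1.5):
    """MSE of the usage-based estimate (Estimate 1)."""
    return (2.0 * a / (p * t / h) + day_to_day_variance(a, phi, t, h)) / K


def mse_ratio(a_i, f_i, p, K=20, t=3600.0, h=250.0, phi=1.5):
    """Approximate MSE of the ratio-estimate (Estimate 3)."""
    v_di = day_to_day_variance(a_i, phi, t, h)
    return 2.0 * a_i / (K * p * t / h) * (1.0 - f_i * (1.0 - p)) + v_di / K


if __name__ == "__main__":
    a_i, p = 5.0, 0.05
    base = 100.0 * math.sqrt(mse_usage(a_i, p)) / a_i
    for n_equal in (1, 2, 4, 10, 50):        # N' equal-sized loads, f_i = 1/N'
        rms = 100.0 * math.sqrt(mse_ratio(a_i, 1.0 / n_equal, p)) / a_i
        print(n_equal, round(rms, 2), round(base - rms, 2))
```

The two limiting cases provide a check on the formula: with $f_i = 0$ it returns the Estimate 1 value, and with $f_i = 1$ it returns the Estimate 1 value for a 100-percent sample.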

Typically, trunk groups carry a large number of point-to-point loads,
each of which represents a small fraction ($f_i \ll 1$) of total offered
load. In this region, Fig. 6 shows that the use of trunk-group
measurements provides only a small improvement in the quality of
CMDS-based load estimates. Again, any improvement is significant if the
associated benefits justify the increased development and data
processing costs. However, based upon the worth-of-data model of Ref. 2,
we have concluded that employing trunk-group measurements will not
significantly reduce the statistical errors associated with CMDS-based
estimates of point-to-point loads.

Fig. 6. Reduction in RMS error afforded by combining point-to-point and
trunk-base measurements.

IV. SUMMARY AND CONCLUSIONS 

We have developed a model for quantifying the accuracy of point- 
to-point traffic measurements as a function of sampling rate and traffic 
parameters. Using this model, we have established 20 percent as an upper 
bound on a cost-optimal sampling rate for a usage-based measurement 
system and 10 percent for an attempt-based system. Furthermore, for 
sampling rates greater than a few percent and loads in the range of en- 
gineering interest, our results show that a usage-based load estimate is 
more accurate than an attempt-based load estimate. We also showed that 
the accuracy of (CMDS) load estimates could be improved by employing 
a ratio estimate that combines point-to-point and trunk-group mea- 
surements; however, in practical applications, the improvement is not 
significant. Our results, together with a worth-of-data model, 2 can be 
used to establish requirements for point-to-point traffic measurement 
systems. 

APPENDIX A 

Mean and Variance of Load Estimates 

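A.1 Estimate 1

From eqs. (3) and (5) and Ref. 4,

$$ E\{\hat{a}^{(1)}\} = \frac{1}{Kpt} \sum_{j=1}^{K}
   E\left\{ E\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij}
   \,\middle|\, N_j \right] \right\}. \tag{19} $$

Since the $h_{ij}$ and $\delta_{ij}$ are independent, and since arrivals
during $I_j$ are Poisson-distributed with rate $\lambda_j$, we have

$$ \begin{aligned}
E\left\{ E\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij}
   \,\middle|\, N_j \right] \right\}
 &= E\{N_j\,p\,h\} \\
 &= ph\,E\{E[N_j \mid \lambda_j]\} \\
 &= ph\,E\{\lambda_j t\} \\
 &= ph\lambda t.
\end{aligned} \tag{20} $$

Substituting (20) into (19) gives

$$ E\{\hat{a}^{(1)}\} = \lambda h = a. \tag{21} $$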

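Furthermore, since the measurements during each interval $I_j$ are
uncorrelated, (3) and (5) give

$$ \mathrm{Var}\{\hat{a}^{(1)}\} = \frac{1}{K^2p^2t^2} \sum_{j=1}^{K}
   \mathrm{Var}\left\{ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \right\}. \tag{22} $$

From Ref. 4,

$$ \begin{aligned}
\mathrm{Var}\left\{ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \right\}
 &= E\left\{ \mathrm{Var}\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij}
    \,\middle|\, N_j \right] \right\}
  + \mathrm{Var}\left\{ E\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij}
    \,\middle|\, N_j \right] \right\} \\
 &= E\{N_j h^2 p(2 - p)\} + \mathrm{Var}\{N_j p h\} \\
 &= \lambda t h^2 p(2 - p) + p^2 h^2\,\mathrm{Var}\{N_j\}.
\end{aligned} \tag{23} $$

Again, given $\lambda_j$, $N_j$ is Poisson-distributed; hence,
$\mathrm{Var}\{N_j \mid \lambda_j\} = E\{N_j \mid \lambda_j\} = \lambda_j t$.
Thus it follows that

$$ \begin{aligned}
\mathrm{Var}\{N_j\}
 &= E\{\mathrm{Var}(N_j \mid \lambda_j)\}
  + \mathrm{Var}\{E(N_j \mid \lambda_j)\} \\
 &= E\{\lambda_j t\} + \mathrm{Var}\{\lambda_j t\} \\
 &= \lambda t + t^2\,\mathrm{Var}\{\lambda_j\}.
\end{aligned} \tag{24} $$

Substituting (24) into (23) gives

$$ \mathrm{Var}\left\{ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \right\}
   = 2\lambda t h^2 p + p^2 t^2 h^2\,\mathrm{Var}\{\lambda_j\}. \tag{25} $$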



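The offered load during $I_j$ is $a_j = \lambda_j h$; hence,
$v_d = \mathrm{Var}\{a_j\} = h^2\,\mathrm{Var}\{\lambda_j\}$.

Thus, from (22) and (25), we have

$$ \mathrm{Var}\{\hat{a}^{(1)}\} = \frac{1}{K}\left[\frac{2a}{pt/h} + v_d\right]. \tag{26} $$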
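
The mean and variance derived above can be checked by simulation. The following Monte Carlo sketch is illustrative only. It assumes gamma-distributed daily arrival rates, which is a choice made here because the model fixes only the mean and variance of the source load; the function names, the seed, and the default parameter values are likewise hypothetical.

```python
import random
import statistics


def simulate_estimate(a, p, K, t, h, v_d, rng):
    """One realization of the usage-based estimate u / (K p t)."""
    lam = a / h                   # mean arrival rate (calls per second)
    var_lam = v_d / h ** 2        # Var{lambda_j}, since v_d = h^2 Var{lambda_j}
    u = 0.0
    for _ in range(K):
        if var_lam > 0:
            # Assumption: gamma-distributed daily rate with the required
            # mean (shape * scale) and variance (shape * scale^2).
            lam_j = rng.gammavariate(lam ** 2 / var_lam, var_lam / lam)
        else:
            lam_j = lam
        clock = rng.expovariate(lam_j)
        while clock < t:
            if rng.random() < p:                  # call is in the sample
                u += rng.expovariate(1.0 / h)     # its full holding time
            clock += rng.expovariate(lam_j)
    return u / (K * p * t)


def monte_carlo(a=5.0, p=0.05, K=5, t=3600.0, h=250.0, v_d=0.759,
                trials=2000, seed=0):
    """Return (sample mean, sample variance, theoretical variance)."""
    rng = random.Random(seed)
    draws = [simulate_estimate(a, p, K, t, h, v_d, rng) for _ in range(trials)]
    theory_var = (2.0 * a / (p * t / h) + v_d) / K
    return statistics.mean(draws), statistics.variance(draws), theory_var


if __name__ == "__main__":
    print(monte_carlo())
```

Usage is credited in full to the interval in which a call arrives, which matches the definition of $u$ and corresponds to neglecting edge effects.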



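We now develop an expression, which we require in Section A.3, for
$\mathrm{Cov}\{\hat{a}^{(1)}, \hat{a}^{(1)}|_{p=1}\}$, where

$$ \hat{a}^{(1)}\big|_{p=1} = \frac{1}{Kt} \sum_{j=1}^{K}
   \sum_{i=1}^{N_j} h_{ij} \tag{27} $$

corresponds to a sampling rate of 100 percent. Thus, from (5) and (27),
we have

$$ \mathrm{Cov}\{\hat{a}^{(1)}, \hat{a}^{(1)}|_{p=1}\}
   = \frac{1}{K^2 t^2 p} \sum_{j=1}^{K} \mathrm{Cov}\left\{
     \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij},\;
     \sum_{i'=1}^{N_j} h_{i'j} \right\}. \tag{28} $$

From Ref. 4,

$$ \begin{aligned}
\mathrm{Cov}\left\{ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij},\;
   \sum_{i=1}^{N_j} h_{ij} \right\}
 &= E\left\{ \mathrm{Cov}\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij},\;
    \sum_{i=1}^{N_j} h_{ij} \,\middle|\, N_j \right] \right\} \\
 &\quad + \mathrm{Cov}\left\{
    E\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \,\middle|\, N_j \right],\;
    E\left[ \sum_{i=1}^{N_j} h_{ij} \,\middle|\, N_j \right] \right\}.
\end{aligned} \tag{29} $$

We first expand

$$ \begin{aligned}
\mathrm{Cov}\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij},\;
   \sum_{i=1}^{N_j} h_{ij} \,\middle|\, N_j \right]
 &= E\left[ \left( \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \right)
    \left( \sum_{i=1}^{N_j} h_{ij} \right) \,\middle|\, N_j \right]
  - E\left[ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij} \,\middle|\, N_j \right]
    E\left[ \sum_{i=1}^{N_j} h_{ij} \,\middle|\, N_j \right] \\
 &= N_j\{2ph^2 + (N_j - 1)ph^2\} - N_j^2\,ph^2 \\
 &= N_j\,ph^2.
\end{aligned} \tag{30} $$

Substituting (30) into (29) and using (24) gives

$$ \begin{aligned}
\mathrm{Cov}\left\{ \sum_{i=1}^{N_j} h_{ij}\,\delta_{ij},\;
   \sum_{i=1}^{N_j} h_{ij} \right\}
 &= ph^2 E\{N_j\} + \mathrm{Cov}\{N_j p h,\; N_j h\} \\
 &= ph^2 E\{N_j\} + ph^2\,\mathrm{Var}\{N_j\} \\
 &= 2phat + pt^2 v_d.
\end{aligned} \tag{31} $$

Thus, from (28) and (31), we have

$$ \mathrm{Cov}\{\hat{a}^{(1)}, \hat{a}^{(1)}|_{p=1}\}
   = \frac{1}{K}\left[\frac{2a}{t/h} + v_d\right]. \tag{32} $$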



A.2 Estimate 2 



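Using expansions similar to those of A.1, it follows from (2) and (9)
that

$$ E\{\hat{a}^{(2)}\} = \frac{\hat{h}}{Kpt} \sum_{j=1}^{K}
   E\left\{ E\left[ \sum_{i=1}^{N_j} \delta_{ij} \,\middle|\, N_j \right] \right\}
   = \lambda\hat{h} \tag{33} $$

and

$$ \mathrm{Var}\{\hat{a}^{(2)}\}
   = \left(\frac{\hat{h}}{Kpt}\right)^{2} \sum_{j=1}^{K}
     \mathrm{Var}\left\{ \sum_{i=1}^{N_j} \delta_{ij} \right\}
   = \left(\frac{\hat{h}}{h}\right)^{2} \frac{1}{K}
     \left[\frac{a}{pt/h} + v_d\right]. \tag{34} $$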



A.3 Estimate 3 

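An approximation for the mean and variance of $\hat{a}_i^{(3)}$ is
obtained by expanding the right-hand side of eq. (14) in a
three-dimensional Taylor series about the point
$(E\{\hat{T}\}, E\{\hat{A}\}, E\{\hat{a}_i^{(1)}\})$. To first order,
this gives the approximations

$$ E\{\hat{a}_i^{(3)}\} \approx \frac{E\{\hat{T}\}}{E\{\hat{A}\}}\,
   E\{\hat{a}_i^{(1)}\} \tag{35} $$

and

$$ \begin{aligned}
\mathrm{Var}\{\hat{a}_i^{(3)}\} \approx{}
 & \mathrm{Var}\{\hat{a}_i^{(1)}\} + f_i^2\,\mathrm{Var}\{\hat{A}\}
   + f_i^2\,\mathrm{Var}\{\hat{T}\}
   - 2f_i\,\mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{A}\} \\
 & + 2f_i\,\mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{T}\}
   - 2f_i^2\,\mathrm{Cov}\{\hat{A}, \hat{T}\},
\end{aligned} \tag{36} $$

where

$$ f_i = \frac{a_i}{\sum_{j=1}^{N} a_j}. \tag{37} $$

Since the trunk-group measurement $\hat{T}$ corresponds to a sampling
rate of 100 percent (i.e., $p = 1$), we have

$$ \hat{T} = \hat{A}\big|_{p=1}
   = \sum_{i=1}^{N} \hat{a}_i^{(1)}\big|_{p=1}. \tag{38} $$

Hence, from (21), (35), and (38), it follows that

$$ E\{\hat{a}_i^{(3)}\} \approx a_i. \tag{39} $$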



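We assume that the daily source loads (for different point-pairs) are
uncorrelated;* hence, the estimates $\hat{a}_i^{(1)}$ are uncorrelated.
Furthermore, since our results are not significantly affected by
differences in the mean holding times, we assume that each
point-to-point load has the same mean holding time, $h$. Thus, we have

$$ \mathrm{Var}\{\hat{A}\} = \sum_{j=1}^{N} \mathrm{Var}\{\hat{a}_j^{(1)}\}
   = \frac{1}{K} \sum_{j=1}^{N} \left[\frac{2a_j}{pt/h} + v_{dj}\right], \tag{41} $$

$$ \mathrm{Var}\{\hat{T}\} = \mathrm{Var}\{\hat{A}|_{p=1}\}
   = \frac{1}{K} \sum_{j=1}^{N} \left[\frac{2a_j}{t/h} + v_{dj}\right], \tag{42} $$

and

$$ \mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{A}\} = \mathrm{Var}\{\hat{a}_i^{(1)}\}
   = \frac{1}{K}\left[\frac{2a_i}{pt/h} + v_{di}\right]. \tag{43} $$

Also, from (32) and (38),

$$ \mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{T}\}
   = \mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{a}_i^{(1)}|_{p=1}\}
   = \frac{1}{K}\left[\frac{2a_i}{t/h} + v_{di}\right] \tag{44} $$

and

$$ \mathrm{Cov}\{\hat{A}, \hat{T}\}
   = \sum_{j=1}^{N} \mathrm{Cov}\{\hat{a}_j^{(1)}, \hat{T}\}. \tag{45} $$

Combining (26), (36), and (41) through (45) gives

$$ \mathrm{Var}\{\hat{a}_i^{(3)}\} \approx
   \frac{1}{K}\left[\frac{2a_i}{pt/h} + v_{di}\right]
   + \frac{2(1 - p)}{Kpt/h}\, f_i^2 \sum_{j=1}^{N} a_j
   - \frac{4(1 - p)}{Kpt/h}\, f_i a_i $$

or, since $f_i \sum_{j=1}^{N} a_j = a_i$,

$$ \mathrm{Var}\{\hat{a}_i^{(3)}\} \approx \frac{2a_i}{Kpt/h}
   \left\{1 - f_i(1 - p)\right\} + \frac{1}{K}\,v_{di}. \tag{46} $$

* We have shown that our results are independent of the covariance
structure of the daily source loads; for simplicity, we assume that they
are uncorrelated.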



APPENDIX B 

Minimum Variance Estimate 

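In this appendix, we show that Estimate 3 can be obtained as an
approximation to a minimum variance linear estimate. Thus, from eq. (13),

$$ \hat{a}_i^{(3)} = \hat{a}_i^{(1)} + w_i\,(\hat{T} - \hat{A}). \tag{47} $$

This estimate is unbiased, i.e.,
$E\{\hat{a}_i^{(3)}\} = E\{\hat{a}_i^{(1)}\} = a_i$, and has variance

$$ \mathrm{Var}\{\hat{a}_i^{(3)}\} = \mathrm{Var}\{\hat{a}_i^{(1)}\}
   + 2w_i\,\mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{T} - \hat{A}\}
   + w_i^2\,\mathrm{Var}\{\hat{T} - \hat{A}\}. \tag{48} $$

The value of $w_i$ which minimizes the variance satisfies the equation

$$ \frac{\partial\,\mathrm{Var}\{\hat{a}_i^{(3)}\}}{\partial w_i} = 0, \tag{49} $$

which implies that

$$ w_i = \frac{\mathrm{Cov}\{\hat{a}_i^{(1)}, \hat{A} - \hat{T}\}}
              {\mathrm{Var}\{\hat{T} - \hat{A}\}}. \tag{50} $$

From Appendix A, it follows that

$$ w_i = \frac{a_i}{\sum_{j=1}^{N} a_j}. \tag{51} $$

Now if $a_i$ is estimated by $\hat{a}_i^{(1)}$ so that $w_i$ is estimated
by $\hat{a}_i^{(1)}/\hat{A}$, eq. (47) becomes

$$ \hat{a}_i^{(3)} = \frac{\hat{a}_i^{(1)}}{\hat{A}}\,\hat{T}. \tag{52} $$

Q.E.D.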
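
The step from the variance-minimizing weight to the load fraction $a_i/\sum_j a_j$ can be confirmed numerically from the moments derived in Appendix A. The sketch below is a hedged illustration: the function name `optimal_weight`, the example loads, and the day-to-day variances are hypothetical, and the variances and covariances are written out directly from the Appendix A results.

```python
def optimal_weight(loads, i, p, K=20, t=3600.0, h=250.0, v_d=None):
    """Variance-minimizing weight for point-pair i (requires p < 1).

    loads : mean point-to-point loads a_1..a_N (erlangs)
    v_d   : optional list of day-to-day variances, one per load
    """
    if v_d is None:
        v_d = [0.0] * len(loads)
    s = t / h
    # Var{A_hat}: sum of the sampled-estimate variances.
    var_A = sum(2.0 * a / (p * s) + v for a, v in zip(loads, v_d)) / K
    # Var{T_hat}: the same sum evaluated at p = 1.
    var_T = sum(2.0 * a / s + v for a, v in zip(loads, v_d)) / K
    # Cov{A_hat, T_hat}: sum of Cov{a_j_hat, T_hat}, which equals Var{T_hat}.
    cov_AT = var_T
    cov_iA = (2.0 * loads[i] / (p * s) + v_d[i]) / K   # Cov{a_i_hat, A_hat}
    cov_iT = (2.0 * loads[i] / s + v_d[i]) / K         # Cov{a_i_hat, T_hat}
    var_diff = var_T + var_A - 2.0 * cov_AT            # Var{T_hat - A_hat}
    return (cov_iA - cov_iT) / var_diff


if __name__ == "__main__":
    loads = [5.0, 10.0, 15.0]
    v_d = [0.7, 2.0, 5.0]
    for i, a in enumerate(loads):
        print(optimal_weight(loads, i, p=0.05, v_d=v_d), a / sum(loads))
```

The day-to-day variances cancel in both the numerator and the denominator, which is why the weight depends only on the load fractions.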